Abstract. Using recent results on singularity analysis for Hadamard products
of generating functions, we obtain the limiting distributions for additive
functionals on $m$-ary search trees on $n$ keys with toll sequence (i) $n^\alpha$ with
$\alpha \ge 0$ ($\alpha = 0$ and $\alpha = 1$ correspond roughly to the space requirement and
total path length, respectively); (ii) $\ln\binom{n}{m-1}$, which corresponds to the
so-called shape functional; and (iii) $\mathbf{1}_{n=m-1}$, which corresponds to the number
of leaves.

1. Introduction 

We begin by providing a brief overview of $m$-ary search trees. For integer $m \ge 2$,
the $m$-ary search tree, or multiway tree, generalizes the binary search tree. The
quantity $m$ is called the branching factor. According to [17], search trees with
branching factors higher than 2 were first suggested by Muntz and Uzgalis |2H] 
"to solve internal memory problems with large quantities of data." For further 
background we refer the reader to El and • 

We consider the space of $m$-ary search trees on $n$ keys, and assume that the keys
can be linearly ordered. Since we shall be concerned only with the structure of the
tree and not its specific contents, we can then without loss of generality take the
set of keys to be $[n] := \{1, 2, \ldots, n\}$. An $m$-ary search tree can be constructed from
a sequence $s$ of $n$ distinct keys in the following way:

(a) If $n < m$, then all the keys are stored in the root node in increasing order.

(b) If $n \ge m$, then the first $m - 1$ keys in the sequence are stored in the root
in increasing order, and the remaining $n - (m - 1)$ keys are stored in the
$m$ subtrees subject to the condition that if $\kappa_1 < \kappa_2 < \cdots < \kappa_{m-1}$ denotes the
ordered sequence of keys in the root, then the keys in the $j$th subtree are those
that lie between $\kappa_{j-1}$ and $\kappa_j$, where $\kappa_0 := 0$ and $\kappa_m := n + 1$, sequenced as
in $s$.

(c) Recursively, all the subtrees are $m$-ary search trees that satisfy conditions (a), (b),
and (c).
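
The construction rules (a), (b), and (c) translate directly into a short recursive routine. The following Python sketch is illustrative only: the function name `build_mary_tree` and the dictionary representation of a node are assumptions made for this example, not notation used elsewhere in the paper.

```python
def build_mary_tree(seq, m):
    """Build an m-ary search tree from a sequence of distinct keys.

    Returns None for the empty tree, otherwise a dict with
      'keys':     the sorted keys stored in the node (at most m - 1), and
      'children': a list of m subtrees if the node is full, else [].
    This representation is a hypothetical choice made for illustration.
    """
    seq = list(seq)
    if not seq:
        return None
    if len(seq) < m - 1:
        # Rule (a) with a non-full root: all keys stay in the root.
        return {"keys": sorted(seq), "children": []}
    # Rule (b): the first m - 1 keys fill the root (for n = m - 1 this
    # coincides with rule (a), and all m subtrees are empty).
    root_keys = sorted(seq[:m - 1])
    buckets = [[] for _ in range(m)]
    for key in seq[m - 1:]:
        # j counts the root keys below `key`, so `key` falls in gap number j.
        j = sum(1 for k in root_keys if k < key)
        buckets[j].append(key)  # the relative order of s is preserved
    # Rule (c): each subsequence is turned into a subtree recursively.
    return {"keys": root_keys,
            "children": [build_mary_tree(b, m) for b in buckets]}


tree = build_mary_tree([3, 1, 4, 2, 5], m=3)
print(tree["keys"])                 # root: the first two keys, sorted
print(tree["children"][2]["keys"])  # keys larger than both root keys
```

Only the shape of the resulting tree matters for the functionals studied below; the stored keys are kept here merely to make the construction easy to inspect.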
In this work we consider additive functionals on $m$-ary search trees, as we describe
next.

Fix $m \ge 2$. Given an $m$-ary search tree $T$, let $L_1(T), \ldots, L_m(T)$ denote the
subtrees rooted at the children of the root of $T$. The size $|T|$ of a tree $T$ is the
number of keys in it. We will call a functional $f$ on $m$-ary search trees additive if
it satisfies the recurrence

(1.1)  $f(T) = \sum_{i=1}^{m} f(L_i(T)) + b_{|T|}$

for any tree $T$ with $|T| \ge m - 1$. Here $(b_n)_{n \ge m-1}$ is a given sequence, henceforth
called the toll sequence or toll function. Note that the recurrence does not
make any reference to $b_n$ for $0 \le n \le m - 2$ nor specify the initial conditions $f(T)$
for $0 \le |T| \le m - 2$.

Several interesting examples can be cast as additive functionals. 

Example 1.1. If we specify $f(T)$ arbitrarily for $0 \le |T| \le m - 2$ and take $b_n \equiv c$
for $n \ge m - 1$, we obtain the "additive functional" framework of [17, §3.1].
(Our definition of an additive functional substantially generalizes this notion.) In
particular if we define $f(\emptyset) := 0$ and $f(T) := 1$ for the unique $m$-ary search tree $T$
on $n$ keys for $1 \le n \le m - 2$ and let $b_n \equiv 1$ for $n \ge m - 1$, then $f(T)$ counts the
number of nodes in $T$ and thus gives the space requirement functional discussed
in [17, §3.4].

Example 1.2. If we define $f(T) := 0$ when $|T| = 0$, $f(T) := 1$ when $1 \le |T| \le m - 2$,
and $b_n := \mathbf{1}_{n=m-1}$, then $f$ is the number of leaves in the $m$-ary search tree.

Example 1.3. If we define $f(T) := 0$ when $0 \le |T| \le m - 2$ and $b_n := n - (m - 1)$
for $n \ge m - 1$, then $f$ is the internal path length functional discussed in [17, §3.5]:
$f(T)$ is the sum of all root-to-key distances in T.

In this work we choose to treat explicitly the toll n, rather than n — (m — 1). 
However our techniques reveal that the lead-order asymptotics of moments and the 
limiting distributions of these two additive functionals are the same. 
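
The three examples above can be checked on small cases by enumerating every $m$-ary search tree on $n$ keys and evaluating the additive recurrence directly. The sketch below is an illustration under assumptions of its own: a tree is encoded as a `(size, children)` pair, and the helper names `compositions`, `all_trees`, and `additive` are hypothetical.

```python
from itertools import product


def compositions(total, parts):
    """Yield every tuple of `parts` nonnegative integers summing to `total`."""
    if parts == 1:
        yield (total,)
        return
    for first in range(total + 1):
        for rest in compositions(total - first, parts - 1):
            yield (first,) + rest


def all_trees(n, m):
    """List all m-ary search trees on n keys as (size, children) pairs."""
    if n <= m - 2:
        return [(n, ())]  # the empty tree or a single non-full node
    trees = []
    for sizes in compositions(n - (m - 1), m):
        for kids in product(*(all_trees(j, m) for j in sizes)):
            trees.append((n, kids))
    return trees


def additive(tree, m, toll, initial):
    """Evaluate f(T) = sum_i f(L_i(T)) + b_{|T|}; `initial` gives f on small trees."""
    size, kids = tree
    if size <= m - 2:
        return initial(size)
    return sum(additive(kid, m, toll, initial) for kid in kids) + toll(size)


m = 3
small = lambda k: 0 if k == 0 else 1   # f on trees with at most m - 2 keys
trees = all_trees(5, m)
nodes = [additive(t, m, lambda n: 1, small) for t in trees]
leaves = [additive(t, m, lambda n: int(n == m - 1), small) for t in trees]
paths = [additive(t, m, lambda n: n - (m - 1), lambda k: 0) for t in trees]
print(len(trees), sorted(set(nodes)), sorted(set(leaves)), sorted(set(paths)))
```

Brute-force enumeration is only practical for small $n$, since the number of trees grows exponentially; it is meant as a sanity check of the definitions, not as a computational method.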

Example 1.4. As described above, each permutation of $[n]$ gives rise to an $m$-ary
search tree. Suppose we place the uniform distribution on such permutations. This
induces a distribution on $m$-ary search trees called the random permutation model.
Denote its probability mass function by $Q$. Dobrow and Fill noted that

(1.2)  $Q(T) = \prod_{x} \binom{|T(x)|}{m-1}^{-1}$,

where $T(x)$ denotes the subtree of $T$ rooted at the node $x$ and the product in (1.2)
is over all nodes $x$ in $T$ that contain $m - 1$ keys. This
functional is sometimes called the "shape functional" as it serves as a crude measure 
of the "shape" of the tree, with "full" trees (such as the complete tree) achieving 
the larger values of Q. For further discussions along these lines, consult and [S]. 
If we define $f(T) := 0$ for $0 \le |T| \le m - 2$ and $b_n := \ln\binom{n}{m-1}$ for $n \ge m - 1$, then
$f(T) = -\ln Q(T)$. Henceforth throughout this paper we will refer to $-\ln Q$ (rather
than $Q$) as the shape functional.
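
The product formula for $Q$ can be confirmed by brute force for small $n$: build the tree shape for every permutation of $[n]$, tabulate how often each shape occurs, and compare with the product of reciprocal binomial coefficients over the full nodes. The following sketch assumes a `(size, children)` encoding of shapes; the names `shape`, `q_formula`, and `check` are hypothetical.

```python
from collections import Counter
from fractions import Fraction
from itertools import permutations
from math import comb


def shape(seq, m):
    """Shape of the m-ary search tree built from seq, as (size, children)."""
    n = len(seq)
    if n <= m - 2:
        return (n, ())
    root = sorted(seq[:m - 1])
    buckets = [[] for _ in range(m)]
    for key in seq[m - 1:]:
        buckets[sum(k < key for k in root)].append(key)
    return (n, tuple(shape(b, m) for b in buckets))


def q_formula(tree, m):
    """Product of 1 / C(|subtree|, m - 1) over all full nodes of the tree."""
    size, kids = tree
    if size <= m - 2:
        return Fraction(1)
    result = Fraction(1, comb(size, m - 1))
    for kid in kids:
        result *= q_formula(kid, m)
    return result


def check(n, m):
    """True if empirical shape frequencies over all n! permutations match the formula."""
    counts = Counter(shape(p, m) for p in permutations(range(1, n + 1)))
    total = sum(counts.values())
    return all(Fraction(c, total) == q_formula(t, m) for t, c in counts.items())


print(check(5, 3), check(4, 2))
```

Exact rational arithmetic (`fractions.Fraction`) is used so that the comparison is an identity check rather than a floating-point approximation.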

Several authors ^3^3121 IS] have studied additive functionals under the random 
permutation model. Clearly the random permutation model does not induce the 
uniform distribution on m-ary search trees with n keys since different permutations 



can give rise to the same tree. In this paper we consider additive functionals under
the uniform model, i.e., when each tree on $n$ keys is considered equally likely. The
shape functional for the case $m = 2$ (uniformly distributed binary search trees)
was considered by Fill [5], who derived (limited) asymptotic information about
its mean and variance. Limiting distributions for the shape functional and other
additive functionals treated in the present paper were identified in [Hj for $m = 2$.
We now generalize these results to include all values of $m$. What makes the analysis
for general $m$ significantly more intricate is that several key quantities (such as the
number $\rho$ discussed at the beginning of Section 3) are for general $m$ known only
implicitly.

One motivation for the present paper can be understood in the context of the 
shape functional. The probability mass function Q corresponding to the random 
permutation model (a reasonably realistic model in practice) is an object of natural 
interest. Dobrow and Fill j2j determined the smallest and largest values of Q; 
but what are "typical" values? We can study this question probabilistically by 
placing a distribution on T and considering the distribution of Q(T). Two rather 
natural choices for this distribution are Q itself (as treated in 0) and the uniform 
distribution on trees (as treated herein). 

We follow the "repertoire" approach of Greene and Knuth J3j , determining the 
effect of a family of basic tolls (for example, those of the form n a ). Then the effect 
of a new toll could be determined by expressing it in terms of the basic tolls. 

For tolls of the form $n^\alpha$ with $\alpha \ge 0$ and the tolls $\ln\binom{n}{m-1}$ and $\mathbf{1}_{n=m-1}$, we
determine asymptotics of moments of all orders and our main results (Theorems 4.5,
4.6, 5.2, 6.2, and 7.2) use these to yield limiting distributions. Here, in broad
terms for the toll $n^\alpha$, is a summary of lead-order results under both the random
permutation model and the uniform model:





                                  Model
Toll function $n^\alpha$          Random permutation    Uniform
$\alpha$ smaller than 1/2         $n$                   $n$
$\alpha$ between 1/2 and 1        $n$                   $n^{\alpha + 1/2}$
$\alpha$ bigger than 1            $n^\alpha$            $n^{\alpha + 1/2}$

Table 1. Order of magnitude of the additive functional corresponding
to the toll $n^\alpha$.

It is not surprising that the orders of magnitude under the uniform model are at
least as large as under the random permutation model. Indeed, it is well known
that trees produced by the uniform model are generally much "stringier" than trees
produced by the random permutation model; for example, height is of order $\sqrt{n}$
under the uniform model and order $\log n$ under the random permutation model.
Furthermore "stringy" trees tend to give large values of the functional.

Qualitatively the uniform model differs significantly from the random permutation
model, where, for example, there is a "phase change" in the limiting behavior
at $m = 26$ from asymptotic normality to non-existence of a limiting distribution,
for any toll whose order of growth does not exceed $n^{1/2}$; see [9] for precise results.
On the other hand, for all $m$ the uniform model leads to the normal distribution for
the shape functional, space requirement, and number of leaves, and to (apparently)
non-normal distributions for tolls of the form $n^\alpha$ with $\alpha > 0$.
We use methods from analytic combinatorics, in particular singularity analysis
of generating functions, to derive the asymptotics of moments of the functional
under consideration and then the method of moments to characterize the limiting
distribution. A key singularity analysis tool is the newly developed "Zigzag
algorithm" [7] to handle Hadamard products of generating functions.

The limiting distributions (and even local limit theorems) for the space requirement
and the number of leaves presumably can also be derived using Theorem 2
of 0] since the bivariate generating functions for these parameters satisfy suitable
functional equations. (This is not the case for the other tolls that we consider.) We
include our proofs of these results for completeness and uniformity of treatment of
tolls.

The paper is organized as follows. In Section 2 we set up the problem using
generating functions. In Section 3 a singular expansion for the generating function
of the number of $m$-ary search trees on $n$ keys is obtained. Sections 4, 5, 6,
and 7 derive limiting distributions for the additive functionals corresponding to the
tolls $n^\alpha$ ($\alpha > 0$), $\ln\binom{n}{m-1}$ (shape functional), 1 (space requirement), and $\mathbf{1}_{n=m-1}$
(number of leaves), respectively.

Notation. Throughout, we will use $[z^n]f(z)$ to denote the coefficient of $z^n$ in the
Taylor series expansion of $f(z)$ around $z = 0$. We use $\mathcal{L}(Y)$ to denote the law (or
distribution) of a random variable $Y$, the symbol $\stackrel{\mathcal{L}}{=}$ to denote equality in law, and
$\stackrel{\mathcal{L}}{\to}$ to denote convergence in law. We denote the (univariate) normal distribution
with mean $\mu$ and variance $\sigma^2$ by $N(\mu, \sigma^2)$.

2. Preliminaries 

Our starting point is the recursive construction of $m$-ary search trees. Let $X_n =
X_n(T)$ denote an additive functional on a random $m$-ary search tree $T$ on $n$ keys.
Let $J = (J_1, \ldots, J_m)$ be the (random) vector of sizes of the subtrees rooted at the
children of the root of $T$. If $T$ is a uniformly distributed $m$-ary search tree on
$n$ keys, then $X_n$ satisfies the distributional recurrence

(2.1)  $X_n \stackrel{\mathcal{L}}{=} \sum_{k=1}^{m} X^{(k)}_{J_k} + b_n$,  $n \ge m - 1$,

with $(X_0, \ldots, X_{m-2}) =: x$ denoting the vector of deterministic values of the
functional for trees with fewer than $m - 1$ keys. The sequence $(b_n)_{n \ge m-1}$ is called the
toll sequence. In (2.1), $\stackrel{\mathcal{L}}{=}$ denotes equality in law (i.e., in distribution), and on the
right,

• for each $k = 1, \ldots, m$, we have $X^{(k)}_j \stackrel{\mathcal{L}}{=} X_j$;

• the quantities $J$; $X^{(1)}_0, \ldots, X^{(1)}_{n-(m-1)}$; $X^{(2)}_0, \ldots, X^{(2)}_{n-(m-1)}$; $\ldots$;
$X^{(m)}_0, \ldots, X^{(m)}_{n-(m-1)}$ are all independent;

• the distribution of $J$ is given by

(2.2)  $P[J_1 = j_1, \ldots, J_m = j_m] = \frac{\tau_{j_1} \cdots \tau_{j_m}}{\tau_n}$,

for $(j_1, \ldots, j_m) \ge 0$ with $j_1 + \cdots + j_m = n - (m - 1)$, where $\tau_k = \tau_k(m)$ is
the number of $m$-ary search trees on $k$ keys.

(Throughout we will take $m \ge 2$ to be fixed and so will suppress the dependence
of various parameters on $m$.)
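
The law of the root split $J$ at (2.2) is easy to tabulate once the counts $\tau_k$ are known. The sketch below computes $\tau_k$ from the convolution identity $\tau_n = \sum \tau_{j_1} \cdots \tau_{j_m}$ (sum over $j_1 + \cdots + j_m = n - (m-1)$), which is what makes the probabilities at (2.2) sum to one; the function names `tau_numbers` and `root_split_distribution` are assumptions of this example.

```python
from fractions import Fraction
from itertools import product
from math import prod


def tau_numbers(N, m):
    """tau[n] = number of m-ary search trees on n keys, for 0 <= n <= N."""
    tau = [1] * min(m - 1, N + 1)          # tau_0 = ... = tau_{m-2} = 1
    for n in range(m - 1, N + 1):
        d = n - (m - 1)
        # power holds the coefficients of tau(z)^m up to degree d.
        power = [1] + [0] * d
        for _ in range(m):
            power = [sum(power[i] * tau[k - i] for i in range(k + 1))
                     for k in range(d + 1)]
        tau.append(power[d])
    return tau


def root_split_distribution(n, m):
    """Law of J = (J_1, ..., J_m) for a uniform tree on n >= m - 1 keys."""
    tau = tau_numbers(n, m)
    d = n - (m - 1)
    return {sizes: Fraction(prod(tau[j] for j in sizes), tau[n])
            for sizes in product(range(d + 1), repeat=m)
            if sum(sizes) == d}


dist = root_split_distribution(6, 3)
print(sum(dist.values()))        # total probability
print(max(dist, key=dist.get))   # one most likely split of the non-root keys
```

For $m = 2$ the counts reduce to the Catalan numbers, and the most likely splits are the extreme ones, which is one way to see why uniform trees tend to be "stringy".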
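Denote the $s$th moment of $X_n$ by $\mu_n^{[s]} := \mathbf{E}\,X_n^s$. Now taking the $s$th power
of (2.1) and conditioning on $(J_1, \ldots, J_m)$ gives

$\mu_n^{[s]} = \sum_{s_0 + \cdots + s_m = s} \binom{s}{s_0, \ldots, s_m} b_n^{s_0} \sum_{\mathbf{j}} \frac{\tau_{j_1} \cdots \tau_{j_m}}{\tau_n}\, \mu_{j_1}^{[s_1]} \cdots \mu_{j_m}^{[s_m]}$,

where $\sum_{\mathbf{j}}$ denotes the sum over all $m$-tuples $(j_1, \ldots, j_m) \ge 0$ such that $\sum_{i=1}^{m} j_i =
n - (m - 1)$. Isolating the terms in the sum where $s_i = s$ for some $i \in [m]$, we get

(2.3)  $\mu_n^{[s]} = \frac{m}{\tau_n} \sum_{j_1=0}^{n-(m-1)} \tau_{j_1} \mu_{j_1}^{[s]} \sum_{j_2 + \cdots + j_m = n-(m-1)-j_1} \tau_{j_2} \cdots \tau_{j_m} + \frac{r_n^{[s]}}{\tau_n}$,

where

(2.4)  $r_n^{[s]} := \sum_{s_0 + \cdots + s_m = s;\ s_1, \ldots, s_m < s} \binom{s}{s_0, \ldots, s_m} b_n^{s_0} \sum_{\mathbf{j}} \tau_{j_1}\mu_{j_1}^{[s_1]} \cdots \tau_{j_m}\mu_{j_m}^{[s_m]}$.
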
Introduce generating functions

$\mu^{[s]}(z) := \sum_{n=0}^{\infty} \mu_n^{[s]} \tau_n z^n$,  $r^{[s]}(z) := \sum_{n=0}^{\infty} r_n^{[s]} z^n$,  $\tau(z) := \sum_{n=0}^{\infty} \tau_n z^n$.

Multiplying (2.3) by $\tau_n z^n$ and summing over $n \ge m - 1$ yields (observe that $\tau_0 =
\cdots = \tau_{m-2} = 1$ and $r_0^{[s]} = \cdots = r_{m-2}^{[s]} = 0$)

$\mu^{[s]}(z) - \sum_{j=0}^{m-2} x_j^s z^j = m z^{m-1} \mu^{[s]}(z) \tau^{m-1}(z) + r^{[s]}(z)$,

so that

(2.5)  $\mu^{[s]}(z) = \frac{\sum_{j=0}^{m-2} x_j^s z^j + r^{[s]}(z)}{1 - m[z\tau(z)]^{m-1}}$.

Furthermore

(2.6)  $r^{[s]}(z) = \sum_{s_0 + \cdots + s_m = s;\ s_1, \ldots, s_m < s} \binom{s}{s_0, \ldots, s_m} b^{\odot s_0}(z) \odot [z^{m-1} \mu^{[s_1]}(z) \cdots \mu^{[s_m]}(z)]$,

where

$b(z) := \sum_{n=0}^{\infty} b_n z^n$,

$b^{\odot k}$ denotes the $k$-fold Hadamard power of $b$ [so that $[z^n]\, b^{\odot k}(z) = b_n^k$],
and $f(z) \odot g(z) = (f \odot g)(z)$ is the Hadamard product of the power series $f$ and $g$.
Note that since $[z^n]\,(z^{m-1} \mu^{[s_1]}(z) \cdots \mu^{[s_m]}(z)) = 0$ for $0 \le n \le m - 2$ we may
instead use

$b(z) := \sum_{n=m-1}^{\infty} b_n z^n$

when convenient.

3. Singular expansions 

We will employ singularity analysis [111 1101 Ej to derive asymptotics of p^} us- 
ing (|2.5(l . In order to do so we need a singular expansion for r(z) around its 
dominant singularity. We will use the theory of analytic continuation of algebraic 
functions (see, for example, ^0 §111.45] or 12, §VII.4]) to derive such an expansion. 
The terminology used is from ^1 §VII.4]. 

Before we begin, we note that Fill and Dobrow [H] were able to use large- 
deviations techniques to obtain lead-order asymptotics of r„. However their tech- 
niques do not seem to be sufficient to derive the higher-order results we will need. 

We now proceed with our analytic approach. As observed by Fill and Dobrow [6],
it follows from the recursive definition of $m$-ary search trees that

(3.1)  $\tau(z) - \sum_{j=0}^{m-2} z^j = z^{m-1} \tau^m(z)$.

Thus $\tau(z)$ is an algebraic series satisfying $P(z, \tau(z)) = 0$, where

(3.2)  $P(z, w) := z^{m-1} w^m - w + \sum_{j=0}^{m-2} z^j$.

The exceptional set of $P$ [excluding $z = 0$, at which $\tau(z)$ clearly has no singularity]
is

$\bigcup_w \{ z : P(z, w) = 0 \text{ and } \frac{\partial}{\partial w} P(z, w) = 0 \}$
  $= \bigcup_w \{ z : z^{m-1} w^m - w + \sum_{j=0}^{m-2} z^j = 0 \text{ and } m(zw)^{m-1} - 1 = 0 \}$
  $= \{ z : m^m ( \sum_{j=1}^{m-1} z^j )^{m-1} = (m - 1)^{m-1} \}$.

The singularities of $\tau(z)$ lie in the exceptional set. It is clear [S] Theorem 3.1]
that there exists a unique $\rho \in (0, 1)$ contained in this set. Furthermore, since the
Taylor coefficients of $\tau(z)$ are nonnegative, by Pringsheim's theorem [19, Theorem
1.17.13], $\rho$ is a dominant singularity of $\tau(z)$. It is straightforward to check that
the polynomial system given by writing $P(z, w) = 0$ in the form $w = \Phi(z, w)$ is
a-proper, a-positive, a-irreducible, and a-aperiodic (cf. [12, §VII.4.2]), so that by
Theorem VII.7 of [12] we have that $\rho$ is the unique dominant singularity and as
$z \to \rho$ a singular expansion of the form

(3.3)  $\tau(z) \sim \sum_{l \ge 0} a_l (1 - \rho^{-1} z)^{l/2}$.

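
Since $\rho$ is known only implicitly for general $m$, it can be useful to compute it numerically. The sketch below solves $m^m (z + \cdots + z^{m-1})^{m-1} = (m-1)^{m-1}$ on $(0, 1)$ by bisection and compares the result with the growth rate of the exact counts $\tau_n$, using the heuristic correction $(1 + 1/n)^{3/2}$ suggested by a square-root singularity. The function names are hypothetical, and the comparison is a numerical sanity check under these assumptions, not part of the proof.

```python
def rho(m, iterations=100):
    """Root in (0, 1) of m^m (z + ... + z^(m-1))^(m-1) = (m-1)^(m-1), by bisection."""
    def g(z):
        s = sum(z ** j for j in range(1, m))
        return m ** m * s ** (m - 1) - (m - 1) ** (m - 1)

    lo, hi = 0.0, 1.0          # g(0) < 0 < g(1) and g is increasing on (0, 1)
    for _ in range(iterations):
        mid = (lo + hi) / 2
        if g(mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def tau_numbers(N, m):
    """tau[n] = number of m-ary search trees on n keys, for 0 <= n <= N."""
    tau = [1] * min(m - 1, N + 1)
    for n in range(m - 1, N + 1):
        d = n - (m - 1)
        power = [1] + [0] * d              # coefficients of tau(z)^m up to degree d
        for _ in range(m):
            power = [sum(power[i] * tau[k - i] for i in range(k + 1))
                     for k in range(d + 1)]
        tau.append(power[d])
    return tau


for m in (2, 3, 4):
    r = rho(m)
    tau = tau_numbers(61, m)
    n = 60
    estimate = tau[n] / tau[n + 1] / (1 + 1 / n) ** 1.5
    print(m, r, estimate, m ** (-1 / (m - 1)) / r)   # last column: m^(-1/(m-1)) / rho
```

For $m = 2$ the equation reduces to $4z = 1$, so the routine should return a value very close to $1/4$, the familiar radius of convergence of the Catalan generating function.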
Remark 3.1. Singularity analysis immediately yields from (3.3) a complete asymptotic
expansion for $\tau_n$, the number of $m$-ary search trees on $n$ keys:



(3-4) r n ~p-"£ 



«2Z+1 -l- 

-n 



t > r H-^) 

In particular,

$\tau_n = [1 + O(n^{-1})]\, \frac{-a_1}{2\sqrt{\pi}}\, n^{-3/2} \rho^{-n}$.
3.1. Determination of the coefficients $a_l$. Define $w_\rho := \frac{m}{m-1} \sum_{j=0}^{m-2} \rho^j$, so that

$P(\rho, w_\rho) = 0$.

Using the definition of $\rho$ and the fact that $w_\rho > 0$ by definition, we have $w_\rho =
m^{-\frac{1}{m-1}} \rho^{-1}$. Now $\frac{\partial}{\partial w} P(\rho, w)$ is negative, zero, or positive as $w > 0$ is less than,
equal to, or greater than $w_\rho$. Hence, for $w > 0$, $P(\rho, w) = 0$ if and only if $w = w_\rho$.
But $a_0 > 0$ and $0 = P(\rho, \tau(\rho)) = P(\rho, a_0)$, so that

(3.5)  $a_0 = w_\rho = m^{-\frac{1}{m-1}} \rho^{-1}$.

To obtain values of $a_l$ for $l \ge 1$, we rewrite (3.1) for $z \ne 1$ as

$\tau(z) - \frac{1 - z^{m-1}}{1 - z} = z^{m-1} \tau^m(z)$

and, then defining $Z := 1 - \rho^{-1} z$, equivalently as

(3.6)  $1 + \rho^{m-1} (1 - Z)^{m-1} [(1 - \rho + \rho Z) \tau^m(z) - 1] = (1 - \rho + \rho Z) \tau(z)$.

By comparing the coefficients of $Z$ in this equation and observing that $a_1 < 0$ we
obtain

(3.7)  $a_1 = -\sqrt{2 m \alpha_*}\; m^{-\frac{m}{m-1}} \rho^{-1}$,

where, matching the notation of [5], we define the key quantity

(3.8)  $\alpha_* := m - (m^{\frac{m}{m-1}} - 1)(\rho^{-1} - 1)^{-1}$.

In the sequel we will also need the following relation, which follows from comparing
coefficients of $Z^{3/2}$ in (3.6):

(3.9)  $\frac{a_0 (a_0 - a_2)}{a_1^2} = \frac{m - 2}{6}$.

Let $\mathcal{A}$ denote a generic (formal) power series in $Z$, possibly different at each
appearance. Similarly, let $\mathcal{P}_d$ denote a generic polynomial in $Z$ of degree at most $d$.
In the sequel we will likewise use $\mathcal{N}$ to denote a generic (formal) power series in
powers of $n^{-1}$. Then, using (3.3) and (3.5), we have

(3.10)  $(1 - m[z\tau(z)]^{m-1})^{-1} \sim \frac{a_0}{-a_1 (m - 1)} Z^{-1/2} + c + Z^{1/2} \mathcal{A} + Z \mathcal{A}$,

where, using (3.9), we have

(3.11)  $c = \frac{m - 2}{3(m - 1)}$;

(3.12)  $z^{m-1} \tau^m(z) \sim a_0 m^{-1} + a_1 Z^{1/2} + Z \mathcal{A} + Z^{3/2} \mathcal{A}$;

and

(3.13)  $\sum_{j=0}^{m-2} x_j z^j = \sum_{j=0}^{m-2} x_j \rho^j + Z \mathcal{P}_{m-3}$.

Thus, by singularity analysis,

(3.14)  $[z^n]\,[z^{m-1} \tau^m(z)] \sim n^{-3/2} \rho^{-n} \left[ \frac{-a_1}{2\sqrt{\pi}} + n^{-1} \mathcal{N} \right]$.

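3.2. Generalized polylogarithms. For $\alpha$ an arbitrary complex number and $r$
a nonnegative integer, the generalized polylogarithm function $\mathrm{Li}_{\alpha,r}$ is defined for
$|z| < 1$ by

$\mathrm{Li}_{\alpha,r}(z) := \sum_{n=1}^{\infty} \frac{(\ln n)^r}{n^\alpha} z^n$.

We record here three singular expansions (as $z \to \rho$) that are computed using
singularity analysis of generalized polylogarithms [10]:

(3.15)
$\mathrm{Li}_{3/2,0}(\rho^{-1} z) \sim \zeta(3/2) - 2\sqrt{\pi} Z^{1/2} + Z \mathcal{A} + Z^{3/2} \mathcal{A}$,
$\mathrm{Li}_{3/2,1}(\rho^{-1} z) \sim -\zeta'(3/2) - 2\sqrt{\pi} Z^{1/2} \ln Z^{-1} - 2\sqrt{\pi}\,[2(1 - \ln 2) - \gamma] Z^{1/2} + Z \mathcal{A} + (Z^{3/2} \log Z) \mathcal{A} + Z^{3/2} \mathcal{A}$,
$\mathrm{Li}_{1,1}(\rho^{-1} z) \sim \frac{1}{2} \ln^2 Z^{-1} - \gamma \ln Z^{-1} + \mathcal{A} + (Z \log Z) \mathcal{A}$.

These expansions will be utilized in the analysis of the shape functional in Section 5.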

3.3. Zigzag algorithm. For the reader's convenience we present the Zigzag
algorithm, which is used extensively in the rest of this paper to determine singular
expansions of Hadamard products. The validity of the algorithm was established
recently in [7], to which the reader is referred for further background discussion.

"Zigzag" Algorithm. [Computes the singular expansion of $f \odot g$ up to
$O(|1 - z|^C)$.]

1. Use singularity analysis to determine separately the asymptotic expansions
of $f_n = [z^n] f(z)$ and $g_n = [z^n] g(z)$ into descending powers of $n$.

2. Multiply the resulting expansions and reorganize to obtain an asymptotic
expansion for the product $f_n g_n$.

3. Choose a basis $\mathcal{B}$ of singular functions, for instance, the standard
basis $\mathcal{B} = \{(1 - z)^\beta [\ln(1 - z)]^k\}$, or the polylogarithm basis $\mathcal{B} = \{\mathrm{Li}_{\beta,k}(z)\}$.
Construct a function $H(z)$ expressed in terms of $\mathcal{B}$ whose singular behavior
is such that the asymptotic form of its coefficients $h_n$ is compatible with
that of $f_n g_n$ up to the needed error terms.

4. Output the singular expansion of $f \odot g$ as the quantity $H(z) + P(z) +
O(|1 - z|^C)$, where $P$ is a polynomial in $(1 - z)$ of degree less than $C$.

The reason for the addition of a polynomial in Step 4 is that integral powers of $(1 - z)$
do not leave a trace in coefficient asymptotics since their contribution is asymptotically
null. The Zigzag Algorithm is principally useful for determining the divergent
part of expansions. If needed, the coefficients in the polynomial $P$ can be expressed
as values of the function $f \odot g$ and its derivatives at 1 once it has been stripped of
its nondifferentiable terms.
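
As a toy illustration of Steps 1 to 3 (not an example taken from the paper), consider $f(z) = (1 - z)^{-a}$ and $g(z) = (1 - z)^{-b}$ with $a, b > 0$ and $a + b > 1$. Since $f_n \sim n^{a-1}/\Gamma(a)$ and $g_n \sim n^{b-1}/\Gamma(b)$, the product $f_n g_n$ matches, to lead order, the coefficients of $H(z) = \frac{\Gamma(a+b-1)}{\Gamma(a)\Gamma(b)} (1 - z)^{-(a+b-1)}$. The sketch below checks this numerically; the function names are hypothetical.

```python
from math import exp, gamma, lgamma


def coeff(a, n):
    """[z^n] (1 - z)^(-a) = Gamma(n + a) / (Gamma(a) Gamma(n + 1)), for a > 0."""
    return exp(lgamma(n + a) - lgamma(a) - lgamma(n + 1))


def hadamard_lead_constant(a, b):
    """Constant C with (f ⊙ g)(z) ~ C (1 - z)^(-(a + b - 1)) for the toy pair above."""
    return gamma(a + b - 1) / (gamma(a) * gamma(b))


a, b = 1.5, 2.0
C = hadamard_lead_constant(a, b)
for n in (10, 100, 1000, 10000):
    exact = coeff(a, n) * coeff(b, n)          # coefficient of the Hadamard product
    matched = C * coeff(a + b - 1, n)          # coefficient of the candidate H(z)
    print(n, exact / matched)                  # ratio should approach 1 like 1 + O(1/n)
```

The ratio differs from 1 by a term of order $1/n$, which corresponds to the lower-order singular terms that a fuller application of Step 3 would add to $H(z)$.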
4. The toll $n^\alpha$

The main theorems of this section are Theorems 4.5 and 4.6, which give limiting
distributions for the additive functionals corresponding to the tolls $n^\alpha$ with $\alpha > 0$.
Although the normalization required to produce a limiting distribution depends
on $m$, our results exhibit a striking invariance principle: The limiting distributions
themselves do not depend on the value of $m$ (and thus in particular, have already
arisen when $m = 2$ in py).

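4.1. Mean. We consider the mean for the toll $b_n = n^\alpha$, where $\alpha > 0$. Using $s = 1$
in (2.6) we have

$r^{[1]}(z) = b(z) \odot [z^{m-1} \tau^m(z)]$,

and consequently, by (2.4) and (3.4),

(4.1)  $[z^n]\, r^{[1]}(z) = r_n^{[1]} = b_n \tau_n \sim n^{\alpha - \frac{3}{2}} \rho^{-n} \left[ \frac{-a_1}{2\sqrt{\pi}} + n^{-1} \mathcal{N} \right]$.

Until further notice, assume that $\alpha \notin \{1/2, 3/2, \ldots\}$. (The contrary cases are
considered later in this section.) We employ the Zigzag Algorithm outlined in
Section 3.3. A compatible singular expansion for $r^{[1]}(z)$ is given by

(4.2)  $r^{[1]}(z) \sim \frac{-a_1}{2\sqrt{\pi}} \Gamma(\alpha - \tfrac{1}{2}) Z^{-\alpha + \frac{1}{2}} + Z^{-\alpha + \frac{3}{2}} \mathcal{A} + \mathcal{A}$.

If $\alpha > 1/2$, then using (3.10), (3.13), and (4.2) in (2.5) we obtain

(4.3)  $\mu^{[1]}(z) \sim \frac{a_0 \Gamma(\alpha - \frac{1}{2})}{2\sqrt{\pi}(m - 1)} Z^{-\alpha} + Z^{-\alpha + \frac{1}{2}} \mathcal{A} + Z^{-\alpha + 1} \mathcal{A} + Z^{-\frac{1}{2}} \mathcal{A} + \mathcal{A}$,

whence, by singularity analysis,

$\rho^n \mu_n^{[1]} \tau_n \sim \frac{a_0 \Gamma(\alpha - \frac{1}{2})}{2\sqrt{\pi}(m - 1)\Gamma(\alpha)} n^{\alpha - 1} + n^{\alpha - \frac{3}{2}} \mathcal{N} + n^{\alpha - 2} \mathcal{N} + n^{-\frac{1}{2}} \mathcal{N}$.

The singular expansion for $\tau_n$ at (3.4) then gives

$\mu_n^{[1]} \sim \frac{a_0 \Gamma(\alpha - \frac{1}{2})}{(-a_1)(m - 1)\Gamma(\alpha)} n^{\alpha + \frac{1}{2}} + n^{\alpha} \mathcal{N} + n^{\alpha - \frac{1}{2}} \mathcal{N} + n \mathcal{N}$.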
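
For a concrete check of the lead-order growth $n^{\alpha + 1/2}$, the exact mean under the uniform model can be computed from the recurrence by summing the functional over all trees. The sketch below does this with exact integers; the function name `totals` is hypothetical. For $m = 2$ and $\alpha = 1$ one has $\tau(z) = (1 - \sqrt{1 - 4z})/(2z)$, hence $a_0 = 2$ and $a_1 = -2$, so the lead term above becomes $\sqrt{\pi}\, n^{3/2}$; this specialization is worked out here as an illustration and is not quoted from the paper.

```python
from fractions import Fraction
from math import comb, pi, sqrt


def totals(N, m, toll, initial):
    """Return (tau, total) for 0 <= n <= N.

    tau[n] is the number of m-ary search trees on n keys and total[n] is the
    sum of the additive functional over all of them, so the mean under the
    uniform model is total[n] / tau[n].
    """
    tau = [1] * min(m - 1, N + 1)
    total = [initial(j) for j in range(len(tau))]
    for n in range(m - 1, N + 1):
        d = n - (m - 1)
        # Coefficients of tau(z)^(m-1) up to degree d.
        power = [1] + [0] * d
        for _ in range(m - 1):
            power = [sum(power[i] * tau[k - i] for i in range(k + 1))
                     for k in range(d + 1)]
        tau_n = sum(power[i] * tau[d - i] for i in range(d + 1))
        # Toll paid at the root plus m symmetric subtree contributions.
        total_n = toll(n) * tau_n + m * sum(total[j] * power[d - j]
                                            for j in range(d + 1))
        tau.append(tau_n)
        total.append(total_n)
    return tau, total


tau, total = totals(100, 2, toll=lambda n: n, initial=lambda j: 0)
for n in (10, 50, 100):
    mean = Fraction(total[n], tau[n])
    print(n, float(mean), float(mean) / (sqrt(pi) * n ** 1.5))
```

The printed ratio approaches 1 only slowly, because the next term in the expansion is of order $n^{\alpha} = n$, just half a power of $n$ below the lead term.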

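On the other hand, if $\alpha < 1/2$, the dominant term in (4.2) is now the constant term
so that

(4.4)  $r^{[1]}(z) + \sum_{j=0}^{m-2} x_j z^j \sim C_\alpha + \frac{-a_1}{2\sqrt{\pi}} \Gamma(\alpha - \tfrac{1}{2}) Z^{-\alpha + \frac{1}{2}} + Z \mathcal{A} + Z^{-\alpha + \frac{3}{2}} \mathcal{A}$,

where

(4.5)  $C_\alpha := r^{[1]}(\rho) + \sum_{j=0}^{m-2} x_j \rho^j = \sum_{n=m-1}^{\infty} \rho^n n^\alpha \tau_n + \sum_{j=0}^{m-2} x_j \rho^j$.

Then, using (3.10) and (4.4) in (2.5) we obtain

(4.6)  $\mu^{[1]}(z) \sim \frac{a_0 C_\alpha}{(m - 1)(-a_1)} Z^{-\frac{1}{2}} + \frac{a_0 \Gamma(\alpha - \frac{1}{2})}{2\sqrt{\pi}(m - 1)} Z^{-\alpha} + \mathcal{A} + Z^{-\alpha + \frac{1}{2}} \mathcal{A} + Z^{-\alpha + 1} \mathcal{A} + Z^{\frac{1}{2}} \mathcal{A}$,

whence singularity analysis and the singular expansion of $\tau_n$ yield

(4.7)  $\mu_n^{[1]} \sim \frac{2 a_0 C_\alpha}{(m - 1) a_1^2}\, n + \frac{a_0 \Gamma(\alpha - \frac{1}{2})}{(-a_1)(m - 1)\Gamma(\alpha)} n^{\alpha + \frac{1}{2}} + n^{\alpha} \mathcal{N} + \mathcal{N} + n^{\alpha - \frac{1}{2}} \mathcal{N}$.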

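If $\alpha \in \{3/2, 5/2, \ldots\}$, then logarithmic terms appear in the singular expansion
compatible with (4.1), so that

$r^{[1]}(z) \sim \frac{-a_1}{2\sqrt{\pi}} \Gamma(\alpha - \tfrac{1}{2}) Z^{-\alpha + \frac{1}{2}} + Z^{-\alpha + \frac{3}{2}} \mathcal{A} + (\log Z) \mathcal{A}$.

This leads to

(4.8)  $\mu^{[1]}(z) \sim \frac{a_0 \Gamma(\alpha - \frac{1}{2})}{2\sqrt{\pi}(m - 1)} Z^{-\alpha} + Z^{-\alpha + \frac{1}{2}} \mathcal{A} + Z^{-\alpha + 1} \mathcal{A} + (Z^{-1/2} \log Z) \mathcal{A} + (\log Z) \mathcal{A}$

and consequently

$\mu_n^{[1]} \sim \frac{a_0 \Gamma(\alpha - \frac{1}{2})}{-a_1 (m - 1)\Gamma(\alpha)} n^{\alpha + \frac{1}{2}} + n^{\alpha} \mathcal{N} + n^{\alpha - \frac{1}{2}} \mathcal{N} + (n \log n) \mathcal{N}$.

Observe that the lead-order term and the order of growth of the remainder [$O(|Z|^{-\alpha + \frac{1}{2}})$]
in the expansion of $\mu^{[1]}(z)$ at (4.8) are the same as at (4.3).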

Finally, we consider a = 1/2. Now, a singular expansion compatible with (|4.1|) 
is given by 

r-W (z) ~ In Z- 1 + C 1/2 + (Z log Z).A + Z.4, 
where the constant term is 



(4.9) 
Thus 

where 



OO , v 

n— m— 1 v v / 



m-2 



(z) + ^ - In Z- 1 + Cj /a + [Z log Z).4 + ZA, 



(4.10) C( /2 := C 1/2 + ^ ■ 

3=0 

Using (|2~5j> and lpTT0"|) . we get 

M W W „ -Z-^lnZ" 1 + a ° q/2 - Z^ 

(4.11) W 20F(m-l) -ai(m-l) 

+ (log Z)A + A+ (Z 1 / 2 log Z)A + Z^A 
Using singularity analysis and (|3.4|) we conclude 

(4.12) $ ^ -nlnn + m/2 n + n 1 / 2 ^ + (logn)M + M, 

—aiyjirym — 1) 

where 



2,/tt /anf-7 + 21n2) a C\ , 
4.2. Higher moments. We will use induction to obtain asymptotics for higher-order
moments. Throughout $\alpha' := \alpha + \frac{1}{2}$. We consider the case $\alpha > 1/2$ in
Proposition 4.1 and handle the remaining cases in Propositions 4.2 and 4.3.

Proposition 4.1. Let $\alpha > 1/2$. Then, for $s \ge 1$, and $\epsilon > 0$ small enough,

$\mu^{[s]}(z) = D_s Z^{-s\alpha' + \frac{1}{2}} + O(|Z|^{-s\alpha' + \frac{1}{2} + q})$,

where $q := \min\{\alpha - \frac{1}{2}, \frac{1}{2}\} - \epsilon$ with

$D_1 = \frac{a_0 \Gamma(\alpha - \frac{1}{2})}{2(m - 1)\sqrt{\pi}}$

and, for $s \ge 2$,

(4.13)  $D_s = \frac{a_0}{(m - 1)(-a_1)} \left[ \frac{m - 1}{2 a_0} \sum_{j=1}^{s-1} \binom{s}{j} D_j D_{s-j} + s D_{s-1} \frac{\Gamma(s\alpha' - 1)}{\Gamma((s - 1)\alpha' - \frac{1}{2})} \right]$.

Proof. We proceed by induction on $s$. For $s = 1$ the claim was proved as (4.3)
and (4.8). [Note that $\mu^{[0]}(z) = \tau(z) \sim a_0$.] Suppose $s \ge 2$. We will first obtain the
asymptotics of $r^{[s]}(z)$ at (2.6) by analyzing each of the terms in the sum there.

Suppose exactly $k \ge 1$ of $s_1, \ldots, s_m$, say $s_1, \ldots, s_k$, are nonzero. Then, by
induction,

$z^{m-1} \mu^{[s_1]}(z) \cdots \mu^{[s_m]}(z) = O(|Z|^{-(s - s_0)\alpha' + \frac{k}{2}})$.

Moreover, if $s_0 = 0$ then the contribution to $r^{[s]}(z)$ is $O(|Z|^{-s\alpha' + \frac{3}{2}})$ unless $k = 1$
or $k = 2$. (Observe, however, that if $k = 1$ then $s_0$ cannot be zero as that would
imply $s_1 = s$.) On the other hand, if $s_0 \ne 0$, then using singularity analysis for
polylogarithms [10] and Hadamard products [7], we see that

$b^{\odot s_0}(z) \odot [z^{m-1} \mu^{[s_1]}(z) \cdots \mu^{[s_m]}(z)] = O(|Z|^{-s\alpha' + \frac{k}{2} + \frac{s_0}{2}})$,

which is $O(|Z|^{-s\alpha' + \frac{3}{2} - \epsilon})$ unless $k = 1$ and $s_0 = 1$. (The $\epsilon$ term in the exponent
avoids logarithmic factors that arise when $-s\alpha' + \frac{k}{2} + \frac{s_0}{2}$ is a nonnegative integer.)

If all of $s_1, \ldots, s_m$ are zero, then $s_0 = s$ and, using (3.12), the contribution to
$r^{[s]}(z)$ is $O(|Z|^{-s\alpha' + \frac{s}{2} + \frac{1}{2}})$ which is $O(|Z|^{-s\alpha' + \frac{3}{2}})$.

Hence unless $s_0 = 0$ and exactly two of $s_1, \ldots, s_m$ are nonzero or $s_0 = 1$ and
exactly one of $s_1, \ldots, s_m$ is $s - 1$ in (2.6), the contribution to $r^{[s]}(z)$ is $O(|Z|^{-s\alpha' + \frac{3}{2} - \epsilon})$.
In the former case the contribution to $r^{[s]}(z)$ is gotten by using the induction
hypothesis as

$\binom{m}{2} \rho^{m-1} Z^{-s\alpha' + 1} a_0^{m-2} \sum_{j=1}^{s-1} \binom{s}{j} D_j D_{s-j} + O(|Z|^{-s\alpha' + 1 + q})$.

In the latter case, again using the induction hypothesis and singularity analysis for
Hadamard products we get the contribution to $r^{[s]}(z)$ as

$m \rho^{m-1} a_0^{m-1} s D_{s-1} \frac{\Gamma(s\alpha' - 1)}{\Gamma((s - 1)\alpha' - \frac{1}{2})} \times Z^{-s\alpha' + 1} + O(|Z|^{-s\alpha' + 1 + q})$.

Finally, noting that the contribution from $\sum_{j=0}^{m-2} x_j^s z^j$ to the numerator on the
right side in (2.5) is negligible, we complete the induction by using (3.5) and (3.10).

□
□ 
For $\alpha < 1/2$, it will be convenient to consider instead the "approximately
centered" random variable

(4.14)  $\tilde{X}_n := X_n - \frac{2 a_0 C_\alpha}{(m - 1) a_1^2}(n + 1) = X_n - \frac{\rho\, m^{\frac{m}{m-1}} C_\alpha}{(m - 1)\alpha_*}(n + 1)$,

where $C_\alpha$ is defined at (4.5) and $\alpha_*$ at (3.8). See (4.7) for the motivation behind
this definition. The choice of centering by a multiple of $n + 1$ rather than $n$ is
motivated by the fact that with this centering $\tilde{X}_n$ satisfies the same distributional
recurrence (2.1) as $X_n$ [with appropriate initial conditions $(\tilde{X}_0, \ldots, \tilde{X}_{m-2})$]. We
will use $\tilde{\mu}^{[s]}(z)$, $\tilde{r}^{[s]}(z)$, and $\tilde{\mu}_n^{[s]}$ to denote the analogous quantities for $(\tilde{X}_n)$.
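
The claim that centering by a multiple of $n + 1$ preserves the recurrence can be verified in one line. The following check is written for an arbitrary constant $c$ (a symbol introduced only for this illustration) and uses nothing beyond (2.1) and the constraint $J_1 + \cdots + J_m = n - (m - 1)$.

```latex
\begin{align*}
\sum_{k=1}^{m} (J_k + 1) &= [\,n - (m-1)\,] + m = n + 1, \\
X_n - c\,(n+1)
  &\stackrel{\mathcal{L}}{=} \sum_{k=1}^{m} \bigl[ X^{(k)}_{J_k} - c\,(J_k + 1) \bigr] + b_n ,
  \qquad n \ge m-1 .
\end{align*}
```

Centering by $c\,n$ instead would give $\sum_{k} c\,J_k = c\,[n - (m-1)]$ on the right, so the toll would be shifted from $b_n$ to $b_n - c\,(m-1)$ and the recurrence would no longer have the same form.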

Proposition 4.2. Let $\alpha < 1/2$. Then, for $s \ge 1$, and $\epsilon > 0$ small enough,

$\tilde{\mu}^{[s]}(z) = D_s Z^{-s\alpha' + \frac{1}{2}} + O(|Z|^{-s\alpha' + 1 - \epsilon}) + c_s$,

where $c_s$ is a constant and $D_s$ is defined as in Proposition 4.1.

Proof outline. The basis of the induction is (4.6) and the induction step is identical
to the one in the proof of Proposition 4.1. We omit the details. □

When $\alpha = 1/2$, we define

(4.15)  $\tilde{X}_n := X_n - \frac{2 a_0 C'_{1/2}}{(m - 1) a_1^2}(n + 1) = X_n - \frac{\rho\, m^{\frac{m}{m-1}} C'_{1/2}}{(m - 1)\alpha_*}(n + 1)$.

The constant $C'_{1/2}$ is defined at (4.10) using (4.9). The key result here is the
following.

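Proposition 4.3. Let $\alpha = 1/2$. Define $\sigma_m := -a_1 (m - 1)/(\sqrt{2}\, a_0)$. Then, for
$s \ge 1$,

$\tilde{\mu}^{[s]}(z) = -a_1 \sigma_m^{-s} Z^{-s + \frac{1}{2}} \sum_{r=0}^{s} C_{s,r} (\ln Z^{-1})^{s-r} + O(|Z|^{-s + 1 - \epsilon})$,

where the constants $C_{s,r}$ do not depend on $m$.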

Proof sketch. The form of the proof is the same as those of Propositions l4.ll and l4.2l 
The basis of the induction is H4.11fl . [Note that without the centering we have 
done we would be saddled with the Z -1 / 2 term, whose coefficient depends on m, 
in (|4.11|) .] For the induction step, in estimating fM(z), unless 

(1) so — 0, and exactly two of Si, . . . , s m are nonzero; or 

(2) so = 1) an d exactly one of si, . . . , s m is nonzero, 

the contribution is 0(\Z\~ s+ ^~ e ). 

In case (Q, by induction, the contribution to fW(z) is 

"tt (^)p m - 1 a^- 2 Z-" +1 £ F s , r (ln Z-y- r + 0(\Z\-'+i"), 



r=0 



where the (-F s ,r) do not depend on m. Using (|3.5|l and the definition of a m , we see 
that the constant multiplying the lead sum is 



p a Q = -ai<r m y ' 
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In case (j2J, the contribution is 

s-1 

where the (G s ,r) do not depend on m. Again, using l|3.5[l . the constant multiplying 
the lead sum is 

Summing all the contributions we get 

s 

r^(z) = -a 1 aJ s -^Z- s+1 J2H s A^Z- 1 ) s - r + 0(\Z\- s +^), 

where the (H Str ) do not depend on m. Recalling l|2.5[l and l|3.10fl . and the definition 
of a m once again, completes the induction. □ 

Remark 4.4. The significance of Proposition 14.31 is that the case m = 2 has al- 
ready been considered in [5], allowing us to determine the desired limiting distri- 
bution fTheorem 14. 6|) without computing the constants (C Sir ) in Proposition ^. 31 

4.3. Limiting distributions. We can now use the method of moments to derive 
limiting distributions for the additive functional. 

Theorem 4.5. Let a =/= 1/2, and let X n denote the additive functional that satisfies 
the distributional recurrence (|2.1I) with b n = n a . Define a' := a + |. 
(a) Ifa> 1/2, then 



a • 



(b) if a < 1/2, then 



(to - lXma*) 1 / 2 



(to — l)a* 



7 1 a i 



with C a defined at (|4.5|l and a* at (|3.8|) . 

In either case we have convergence of all moments, where Y a has the unique distri- 
bution whose moments are given by E Y^ = M s = M s (a) . Here 



Mi 



r(a-i) 



V2T(a) ' 
and, for s > 2, 

Proof. If a > 1/2, then by Proposition ^. II singularity analysis, and the asymptotics 
of r n at 13. 4L we have 

EX: = /4? = ? fw^f + 0(n-'-«). 

(-ai)r(sa' - 2) 

Define a = a m := — ai(m— l)/(\/2ao) = (to— l)(a*/m) 1 / 2 , where the last equality 
uses (|3.5|) . I|3.7|) . and (|3.8|l . Then, for fixed to, as n — > 00, 






JAMES ALLEN FILL AND NEVIN KAPUR 



where, for s > 1, 



(-oi)r(aa' - i)' 



In particular, Mj = T(a — s)/[v^T(a)]. Furthermore, using l)4.13[l . we obtain the 
recurrence for M s . 

Convergence in distribution follows from the fact that (M s ) satisfies Carleman's 
condition, as has been established in 

The same proof holds if a < 1/2, now by considering X n defined at (14. 14ft and 
using Proposition 14. 21 □ 

The case a = 1/2 is covered by the following result. As alluded to in Remark l4.4l 
we will use the known results for m — 2 to derive the distribution for all to. 

Theorem 4.6. Let X n denote the additive functional that satisfies the distribu- 
tional recurrence (|2.1|) with b n = n 1 !" 1 . Define 



do(m) 



(to — l)a* ' 



where C' 1 , 2 is defined at (|4.10|) using (|4.9(l . Then, 



as n — > co, 









■ (n In n + 


V2n ^ 



(2 In 2 + 7 ) + V2^7 m do(«i)] «) } ^ Y 1/a , 



with convergence of all moments, where Y\ii has the unique distribution with mo- 
ments rrik '■= E Y"A 2 given by mo = 1, mi — 0, and for k > 2, 

1 r(fc - 1) 



E 



ki+k2+k3=k 
ki ,k 2 <k 



k 

fcl,fe 2 ,fe 3 



m kl m k2 



/27T 



k 3 



Jk u k 2 M + 4 \/ 2 kmk - 1 



Here 



Jki,k2,ks 



x k ^(l-x) k2 -i [x\nx+ {1- x)\n{l- x)] k3 dx. 



Proof. Recall the definition of X n = X n (to) at (|4.15|l . Using Proposition 14.31 
singularity analysis, and (|3.4I) . we see that 

s 

r—0 

where the (C Sjr ) do not depend on to. Thus, for all (integer) s > 0, 
E [a m X n (m)} s = E [a 2 X n (2)] s + 0(n s ^+ £ ). 
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It follows that 



= (ji In n + (2 In 2 + 7) + \f2%a m do (m) rij 



E \ <r m X n (to) p= [n In n + (2 In 2 + 7)71] 



E <| a 2 X„(2) - --L [n In n + (2 In 2 + 7 )n] }• 0( »*- 7+' ) 



= 2~ s/2 E jx„(2)- -^=7ilnn-Din| + 0(?r^ +£ ), 



the last equality using 172 = 2 1 / 2 and 



£>i := -=[21n2+7 + V5rdo(2)]. 

V7T 



Observe that 



-/)„ := [ -j=nh\n + Din ) - 



-=(n + 1) In (n + 1) + Di(n + 1) 

V 7T 



O(logn). 



Hence, for Z > 0, 



E 



X n (2) — nlnn — Z?in 

V7T 



E 



X n {2) - -=(n + 1) In (n + 1) - £>i(n + 1) + 5„ 



fe=0 



= 5] ( S )Kn fc + (n fc )]0((logn) s - fc ) 



[to s + o(l)]n s , 



where 



771^ := lim E < n 1 



X„(2) - -=(n + 1) In (n + 1) - Di{n + 1) 



But the to/j's have already been determined [HJ Proposition 3.8] and the claim 
follows from there. □ 

5. The shape functional 

Recall that the shape functional $X_n$ is the additive functional on $m$-ary search
trees induced by the toll

$b_n = \ln\binom{n}{m-1}$,

with $(X_0, \ldots, X_{m-2}) = 0$. By (2.5), we have

(5.1)  $\mu^{[s]}(z) = \frac{r^{[s]}(z)}{1 - m[z\tau(z)]^{m-1}}$,

with $r^{[s]}(z)$ given by (2.6), where

$b(z) = \sum_{n=m-1}^{\infty} \ln\binom{n}{m-1}\, z^n$.

It follows from Theorem 1 of ^Hl that $b(z)$ is amenable to singularity analysis. In
the sequel we will make use of the following asymptotic expansion of $b_n$ as $n \to \infty$:

(5.2)  $b_n \sim (m - 1)\ln n - \ln[(m - 1)!] + n^{-1} \mathcal{N}$.
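
The quality of the two-term approximation at (5.2) is easy to inspect numerically. In the sketch below (function names hypothetical), the gap between the exact toll and its two lead terms is multiplied by $n$; expanding $\ln(1 - i/n)$ for $i = 1, \ldots, m - 2$ suggests that this product should tend to $-(m - 1)(m - 2)/2$, which is consistent with a remainder of the form $n^{-1}\mathcal{N}$.

```python
from math import comb, factorial, log


def shape_toll(n, m):
    """Exact toll b_n = ln C(n, m - 1) of the shape functional."""
    return log(comb(n, m - 1))


def shape_toll_lead(n, m):
    """Two lead terms of the expansion: (m - 1) ln n - ln (m - 1)!."""
    return (m - 1) * log(n) - log(factorial(m - 1))


m = 4
for n in (10, 100, 1000, 10000):
    gap = shape_toll(n, m) - shape_toll_lead(n, m)
    print(n, gap, n * gap)   # n * gap should settle near -(m - 1)(m - 2) / 2
```

For $m = 2$ the toll is exactly $\ln n$, so the approximation has no error at all in the binary case.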

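5.1. Mean. Using (3.14) and (5.2) we have

$[z^n]\, r^{[1]}(z) \sim n^{-3/2} \rho^{-n} \left[ \frac{(-a_1)(m - 1)}{2\sqrt{\pi}} \ln n - \ln[(m - 1)!]\, \frac{-a_1}{2\sqrt{\pi}} + (n^{-1} \ln n) \mathcal{N} + n^{-1} \mathcal{N} \right]$.

A compatible singular expansion can be computed using (3.15):

$r^{[1]}(z) \sim C_{\ln} - (-a_1)(m - 1) Z^{1/2} \ln Z^{-1}$
  $- \{(-a_1)(m - 1)[2(1 - \ln 2) - \gamma] - (-a_1) \ln[(m - 1)!]\}\, Z^{1/2}$
  $+ Z \mathcal{A} + (Z^{3/2} \log Z) \mathcal{A} + Z^{3/2} \mathcal{A}$,

where

$C_{\ln} := r^{[1]}(\rho) = \sum_{n=m-1}^{\infty} \rho^n \tau_n \ln\binom{n}{m-1}$.

Now using (5.1) and (3.10) we get

(5.3)  $\mu^{[1]}(z) \sim \frac{a_0 C_{\ln}}{(-a_1)(m - 1)} Z^{-1/2} - a_0 \ln Z^{-1} - a_0 \left[ (2(1 - \ln 2) - \gamma) - \frac{\ln[(m - 1)!]}{m - 1} \right] + \frac{(m - 2) C_{\ln}}{3(m - 1)}$
  $+ (Z^{1/2} \log Z) \mathcal{A} + Z^{1/2} \mathcal{A} + Z \mathcal{A} + (Z \log Z) \mathcal{A}$,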
whence singularity analysis and 1)3. 4J) yield 



(m - 2)C] 



In 



3(m- 1) 



2a C] 



(to 



- 2y/n ( \ n 1 / 2 + (logn)J\f + N + n~ 1/2 N. 



5.2. Second moment and variance. As in the case of the toll $n^\alpha$ when $\alpha < 1/2$,
it will be convenient to consider the random variable

\[
\tilde X_n := X_n - d_1(n+1),
\]

where here

\[
d_1 := \frac{2a_0 C_{\ln}}{(m-1)a_1^2}.
\]

Thus, $\tilde X_n$ satisfies the same distributional recurrence (2.1) as $X_n$ with initial
conditions $(\tilde X_0, \dots, \tilde X_{m-2}) = -d_1(1, \dots, m-1)$. Again, we use $\tilde r^{[s]}(z)$, $\tilde\mu^{[s]}(z)$, and $\tilde\mu_n^{[s]}$ to
denote the analogous quantities for $\tilde X_n$. Then, noting that $\mathbf{E}\,\tilde X_n = \mathbf{E}\,X_n - d_1(n+1)$,
a singular expansion for $\tilde\mu^{[1]}(z)$ can be obtained using (5.3), namely,

\[
\tilde\mu^{[1]}(z) = -a_0\ln Z^{-1} - d_2 + O(|Z|^{\frac12-\epsilon}), \tag{5.4}
\]

where

\[
d_2 := a_0\left[(2(1-\ln 2) - \gamma) - \frac{\ln[(m-1)!]}{m-1}\right] - \frac{(m-2)\,C_{\ln}}{3(m-1)} + (a_0 - a_2)\,d_1.
\]
We begin the variance computation by obtaining asymptotics for the second
moment $\tilde\mu_n^{[2]}$. To that end, we calculate the contribution of the terms in the sum

\[
\tilde r^{[2]}(z) = \sum_{\substack{s_0+\cdots+s_m=2\\ s_1,\dots,s_m<2}} \binom{2}{s_0,\dots,s_m}\, b^{\odot s_0}(z) \odot \left[z^{m-1}\,\tilde\mu^{[s_1]}(z)\cdots\tilde\mu^{[s_m]}(z)\right].
\]



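When $s_0 = 0$, exactly two of $s_1, \dots, s_m$ equal 1. The contribution to $\tilde r^{[2]}(z)$ from
such terms is

\[
m(m-1)\,z^{m-1}\tau^{m-2}(z)\left[\tilde\mu^{[1]}(z)\right]^2
= m(m-1)\rho^{m-1}a_0^{m-2}\left[a_0^2\ln^2 Z^{-1} + 2a_0 d_2\ln Z^{-1} + d_2^2\right] + O(|Z|^{\frac12-2\epsilon}).
\]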



When $s_0 = 1$, exactly one of $s_1, \dots, s_m$ equals 1. The contribution to $\tilde r^{[2]}(z)$ from
such terms is obtained (after some routine calculations using the Zigzag algorithm)
using the polylogarithm expansion at (3.15) as



2mb{z) 



m — 1 



In 2 Z- 1 + ( 7 (m - 1) + ln[(m - 1)!]) bxZ' 1 



+ constant + 0{\Z\^~ e ). 

Finally, when $s_0 = 2$ the contribution to $\tilde r^{[2]}(z)$ is $b^{\odot 2}(z) \odot [z^{m-1}\tau^m(z)]$, which
equals constant $+\ O(|Z|^{\frac12-\epsilon})$.

Summing these contributions and using (5.4) and (3.5), we conclude (note the
cancellation of the ostensible lead term) that

\[
\tilde r^{[2]}(z) = 4a_0(m-1)(1-\ln 2)\ln Z^{-1} + d_3 + O(|Z|^{\frac12-\epsilon}),
\]

where

\[
d_3 := \lim_{z\to\rho}\left[\tilde r^{[2]}(z) - 4a_0(m-1)(1-\ln 2)\ln Z^{-1}\right].
\]

This leads, using (3.10), to

\[
\tilde\mu^{[2]}(z) = \frac{4a_0^2}{-a_1}\,(1-\ln 2)\,Z^{-1/2}\ln Z^{-1} + \frac{d_3\,a_0}{(-a_1)(m-1)}\,Z^{-1/2} + O(|Z|^{-\epsilon}). \tag{5.5}
\]



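By singularity analysis

\[
\rho^n\,\tilde\mu_n^{[2]}\,\tau_n = \frac{4a_0^2}{\sqrt{\pi}\,(-a_1)}\,(1-\ln 2)\,n^{-1/2}\ln n + d_4\,n^{-1/2} + O(n^{-1+\epsilon}),
\]

where

\[
d_4 := \frac{4a_0^2}{\sqrt{\pi}\,(-a_1)}\,(1-\ln 2)(\gamma + 2\ln 2) + \frac{d_3\,a_0}{\sqrt{\pi}\,(-a_1)(m-1)}.
\]

Using the asymptotics of $\tau_n$ at (3.4) we get

\[
\tilde\mu_n^{[2]} = 8\left(\frac{a_0}{a_1}\right)^2(1-\ln 2)\,n\ln n + \frac{2\sqrt{\pi}\,d_4}{-a_1}\,n + O(n^{\frac12+\epsilon}).
\]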
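Thus

\[
\operatorname{Var} X_n = \operatorname{Var} \tilde X_n = \tilde\mu_n^{[2]} - \big(\tilde\mu_n^{[1]}\big)^2
= 8\left(\frac{a_0}{a_1}\right)^2(1-\ln 2)\,n\ln n + \left(\frac{2\sqrt{\pi}\,d_4}{-a_1} - \frac{4\pi a_0^2}{a_1^2}\right) n + O(n^{\frac12+\epsilon}).
\]

5.3. Higher moments and limiting distribution.
Proposition 5.1. For $s \ge 2$ and $\epsilon > 0$ small enough,

\[
\tilde\mu^{[s]}(z) = Z^{-\frac{s}{2}+\frac12}\sum_{j=0}^{\lfloor s/2\rfloor} C_{s,j}\,\ln^{\lfloor s/2\rfloor - j} Z^{-1} + O(|Z|^{-\frac{s}{2}+1-\epsilon}),
\]

with

\[
C_{2l,0} = \frac{1}{-2a_1}\sum_{j=1}^{l-1}\binom{2l}{2j}\,C_{2j,0}\,C_{2l-2j,0}, \quad l \ge 2; \qquad C_{2,0} = \frac{4a_0^2}{-a_1}\,(1-\ln 2). \tag{5.6}
\]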

Proof sketch. We proceed by induction. For $s = 2$ the claim is true by (5.5).
[Recalling (5.4), we note in passing that the claim is not true for $s = 1$.] In the rest
of the proof we will explicitly compute $C_{s,j}$ only when $s$ is even and $j = 0$. The
rest of the coefficients will appear as unspecified constants.

For the induction step consider $s \ge 3$. In a manner analogous to the proof of
Proposition 4.1 we obtain the asymptotics of $\tilde r^{[s]}(z)$ by first analyzing the
contributions of the terms in the sum (2.6). As in that proof one can check that unless

(a) $s_0 = 0$ and exactly two of $s_1, \dots, s_m$ are nonzero, or

(b) $s_0 = 1$ and exactly one of $s_1, \dots, s_m$ is $s - 1$

in (ESJ), the contribution to f [s] (z) is 0(\Z\-i+i- ( - s+1 ^). 
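In case (a), the contribution to $\tilde r^{[s]}(z)$ is

\[
\binom{m}{2}\,z^{m-1}\tau^{m-2}(z)\sum_{j=1}^{s-1}\binom{s}{j}\,\tilde\mu^{[j]}(z)\,\tilde\mu^{[s-j]}(z). \tag{5.7}
\]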

In this sum unless j is 1 or s — 1, the induction hypothesis implies a contribution 
of 

(5.8) 



L j/2J + L^J , N 

™)p m - 1 aZ- 2 Z-% +1 {j) As <i< 1 (ln Lj/2j + L£ ^ J ^ Z-^+0(\Z\-i + ^-^), 

with $A_{s,j,0} = C_{j,0}\,C_{s-j,0}$. Notice that when $s$ is even and $j$ is odd, this
contribution is $O\big(|Z|^{-\frac{s}{2}+1}\ln^{\lfloor s/2\rfloor - 1} Z^{-1}\big)$. In all other parity cases the contribution is

\[
O\big(|Z|^{-\frac{s}{2}+1}\ln^{\lfloor s/2\rfloor} Z^{-1}\big).
\]

On the other hand the total contribution to $\tilde r^{[s]}(z)$ from the terms where $j$ is 1
or $s - 1$ in the sum in (5.7) is obtained using the induction hypothesis and (5.4) as

\[
-2s\binom{m}{2}\rho^{m-1}a_0^{m-1}\,Z^{-\frac{s}{2}+1}\sum_{j=0}^{\lfloor\frac{s-1}{2}\rfloor+1} E_{s,j}\,\ln^{\lfloor\frac{s-1}{2}\rfloor+1-j} Z^{-1} + O(|Z|^{-\frac{s}{2}+\frac32-2\epsilon}), \tag{5.9}
\]

where $E_{s,0} = C_{s-1,0}$.

In case (b), the total contribution to $\tilde r^{[s]}(z)$ is

\[
m s\, b(z) \odot \left[z^{m-1}\tau^{m-1}(z)\,\tilde\mu^{[s-1]}(z)\right].
\]

Now another application of the Zigzag algorithm yields this contribution to be

\[
s\,m(m-1)\rho^{m-1}a_0^{m-1}\,Z^{-\frac{s}{2}+1}\sum_{j=0}^{\lfloor\frac{s-1}{2}\rfloor+1} D_{s,j}\,\ln^{\lfloor\frac{s-1}{2}\rfloor+1-j} Z^{-1} + O(|Z|^{-\frac{s}{2}+\frac32-2\epsilon}), \tag{5.10}
\]

where $D_{s,0} = C_{s-1,0}$. Note that the lead term here is exactly the same as in (5.9)
but with opposite sign, so that (5.9) and (5.10) cancel each other to lead order.
Summing the various contributions, we find that f^(z) is of the form 

L*/2J 

fW(z) = Z-4 +1 J2 C S J\^ S ' 2 ^ Z- 1 ) + O (\Z\-i+i- 



where, for s even, 



C. 



s,0 



3=0 



p m l a™ 2 ^2 ( - \ C 3flCs-i$- 

0<j<s v'' 
j even 



Using (3.10), the result follows.



□ 



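It follows from Proposition 5.1, singularity analysis, and the asymptotics of $\tau_n$
at (3.4) that

\[
\tilde\mu_n^{[s]} = \frac{2\sqrt{\pi}\,C_{s,0}}{(-a_1)\,\Gamma\!\left(\frac{s-1}{2}\right)}\,n^{s/2}\ln^{\lfloor s/2\rfloor} n + O\big(n^{s/2}\ln^{\lfloor s/2\rfloor - 1} n\big),
\]

whence for $s \ge 1$, as $n \to \infty$,

\[
\mathbf{E}\left[\frac{\tilde X_n}{\sqrt{n\ln n}}\right]^{2s} \to \frac{2\sqrt{\pi}\,C_{2s,0}}{(-a_1)\,\Gamma\!\left(s-\frac12\right)}
\quad\text{and}\quad
\mathbf{E}\left[\frac{\tilde X_n}{\sqrt{n\ln n}}\right]^{2s-1} = o(1).
\]

Solving the recurrence (5.6) for $C_{2s,0}$ yields

\[
C_{2s,0} = \frac{(-a_1)\,\Gamma\!\left(s-\frac12\right)}{2\sqrt{\pi}}\,\frac{(2s)!}{2^s\,s!}\,\sigma^{2s}
= \frac{(-a_1)\,(2s)!\,(2s-2)!}{2^{3s-1}\,s!\,(s-1)!}\,\sigma^{2s},
\]

where $\sigma^2 := 8\,(a_0/a_1)^2\,(1-\ln 2)$. The method of moments (see, for example, ^
Theorem 30.1]) implies then that the shape functional is asymptotically normal.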
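
The step from the recurrence (5.6) to normal moments can be sanity-checked numerically. The sketch below is illustrative only: the function names are hypothetical, and the inputs $a_0 = 2$, $a_1 = -2$ are an assumption corresponding to the binary case, where $\tau(z)$ is the Catalan generating function (any $a_0$ and any $a_1 < 0$ would serve equally well). It solves (5.6) for $C_{2l,0}$, forms the limiting even moments $2\sqrt{\pi}\,C_{2s,0}/[(-a_1)\Gamma(s-\frac12)]$, and compares them with the even moments $(2s)!\,\sigma^{2s}/(2^s s!)$ of $N(0,\sigma^2)$.

```python
import math

def solve_C(a0: float, a1: float, lmax: int) -> dict:
    """Return {l: C_{2l,0}} for l = 1..lmax from recurrence (5.6)."""
    C = {1: 4 * a0 ** 2 * (1 - math.log(2)) / (-a1)}
    for l in range(2, lmax + 1):
        total = sum(math.comb(2 * l, 2 * j) * C[j] * C[l - j] for j in range(1, l))
        C[l] = total / (-2 * a1)
    return C

def limiting_even_moment(C_2s: float, a1: float, s: int) -> float:
    """Limit of E[X~_n / sqrt(n ln n)]^(2s) expressed through C_{2s,0}."""
    return 2 * math.sqrt(math.pi) * C_2s / ((-a1) * math.gamma(s - 0.5))

def normal_even_moment(sigma2: float, s: int) -> float:
    """(2s)!/(2^s s!) * sigma^(2s), the 2s-th moment of N(0, sigma^2)."""
    return math.factorial(2 * s) / (2 ** s * math.factorial(s)) * sigma2 ** s

if __name__ == "__main__":
    a0, a1 = 2.0, -2.0                      # assumed illustrative values
    sigma2 = 8 * (a0 / a1) ** 2 * (1 - math.log(2))
    C = solve_C(a0, a1, 6)
    for s in range(1, 7):
        print(s, limiting_even_moment(C[s], a1, s), normal_even_moment(sigma2, s))
```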

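Theorem 5.2. Let $X_n$ denote the shape functional for uniformly distributed $m$-ary
search trees on $n$ keys. Then

\[
\frac{X_n - d_1(n+1)}{\sqrt{n\ln n}} \xrightarrow{\mathcal{L}} N(0,\sigma^2)
\quad\text{and}\quad
\frac{X_n - \mathbf{E}\,X_n}{\sqrt{\operatorname{Var} X_n}} \xrightarrow{\mathcal{L}} N(0,1),
\]

where

\[
d_1 := \frac{2a_0}{(m-1)a_1^2}\sum_{n=m-1}^{\infty}\rho^n\tau_n\ln\binom{n}{m-1}
\]

and $\sigma^2 := 8\,(a_0/a_1)^2\,(1-\ln 2)$.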






JAMES ALLEN FILL AND NEVIN KAPUR 



Remark 5.3. It is known 0E] that under the random permutation model the shape
functional centered by its mean and scaled by its standard deviation is asymptotically
normal for $2 \le m \le 26$ and does not have a limiting distribution for $m > 26$.
In contrast, under the uniform model we have asymptotic normality for all $m \ge 2$.



6. The space requirement 

The space requirement for $m$-ary search trees is the number of nodes in the
tree [17]. (For a binary search tree the space requirement for $n$ keys is clearly $n$, so
in this section we assume $m \ge 3$.) The limiting distribution of this parameter under
the random permutation model has been considered by several authors |18 |ll6l l2*|l$].
In our framework it is the additive functional $X_n$ corresponding to the toll $\mathbf{1}_{n \ge m-1}$
with initial conditions $(X_0, \dots, X_{m-2}) = (0, 1, \dots, 1)$.
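
To make the definition concrete, here is a minimal sketch that builds an $m$-ary search tree by inserting keys one at a time and counts its nodes. The class `Node` and the functions `build` and `space_requirement` are hypothetical names introduced for illustration; sequential insertion is assumed to produce the same tree as the recursive construction (a)-(c) described in the introduction.

```python
from bisect import bisect_left, insort

class Node:
    """One node of an m-ary search tree: up to m-1 sorted keys and m child slots."""
    def __init__(self, m: int):
        self.keys = []
        self.children = [None] * m

def build(seq, m: int):
    """Insert the distinct keys of seq in order; returns None for an empty sequence."""
    root = None
    for key in seq:
        if root is None:
            root = Node(m)
        node = root
        while len(node.keys) == m - 1:        # node is full: descend
            j = bisect_left(node.keys, key)    # index of the subtree that receives key
            if node.children[j] is None:
                node.children[j] = Node(m)
            node = node.children[j]
        insort(node.keys, key)                 # room left: store key in sorted order
    return root

def space_requirement(root) -> int:
    """Number of nodes in the tree (0 for the empty tree)."""
    if root is None:
        return 0
    return 1 + sum(space_requirement(child) for child in root.children)

if __name__ == "__main__":
    print(space_requirement(build([3, 1, 4, 5, 2], 2)))        # binary case: one key per node
    print(space_requirement(build([4, 2, 6, 1, 3, 5, 7], 3)))
```

In the binary case every node holds exactly one key, so the count equals the number of keys, which is why the section restricts attention to $m \ge 3$.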

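The $s$th moment $\mu_n^{[s]} := \mathbf{E}\,X_n^s$ can be computed as usual using (2.5), where
now

\[
r^{[s]}(z) = z^{m-1}\sum_{\substack{s_0+\cdots+s_m=s\\ s_1,\dots,s_m<s}}\binom{s}{s_0,\dots,s_m}\,\mu^{[s_1]}(z)\cdots\mu^{[s_m]}(z) \tag{6.1}
\]

since the toll generating function $b(z) = (1-z)^{-1}$ serves as the identity for Hadamard
products.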

6.1. Mean. Substituting $s = 1$ in (6.1) and using (3.1) and (3.3) yields

\[
r^{[1]}(z) + \sum_{j=1}^{m-2} z^j = z^{m-1}\tau^m(z) + \sum_{j=1}^{m-2} z^j = \tau(z) - 1 \sim (a_0 - 1) + a_1 Z^{1/2} + Z\mathcal{A} + Z^{3/2}\mathcal{A}.
\]

Then, by O and (|3~TU|) . 

a 



(6.2) ^l(z)~ °o(°o-l) z -i/ 2 . 

— ai(m — 1) 



c (a - 1) 



Z 1/2 „4 + ZA 



771 — 1 

Singularity analysis and the asymptotics of $\tau_n$ immediately lead to

\[
\mu_n^{[1]} \sim \frac{2a_0(a_0-1)}{(m-1)a_1^2}\,n + \mathcal{N}. \tag{6.3}
\]

6.2. Variance. As for the shape functional it is convenient to consider instead the
"centered" functional $\tilde X_n := X_n - d_1(n+1)$, where now

2ao(ao — 1) m(l — pm™-- 1 ) 
1 a 2 (m — 1) (m — l)a* 

The "centered" space requirement satisfies the same distributional recurrence as
the space requirement with initial conditions

\[
(\tilde X_0, \dots, \tilde X_{m-2}) = -(d_1,\ 2d_1 - 1,\ \dots,\ (m-1)d_1 - 1).
\]

We employ the same notation as Section|31 with $\tilde r^{[s]}(z)$, $\tilde\mu^{[s]}(z)$, and $\tilde\mu_n^{[s]}$ denoting
quantities analogous to $r^{[s]}(z)$, $\mu^{[s]}(z)$, and $\mu_n^{[s]}$.
By definition

\[
\tilde\mu^{[1]}(z) = \mu^{[1]}(z) - d_1\sum_{n=0}^{\infty}(n+1)\,\tau_n z^n. \tag{6.4}
\]

Now

\[
\sum_{n=0}^{\infty}(n+1)\,\tau_n z^n = z\tau'(z) + \tau(z) \sim -\frac{a_1}{2}\,Z^{-1/2} + (a_0 - a_2) + Z^{1/2}\mathcal{A} + Z\mathcal{A}. \tag{6.5}
\]



Thus using iftHfy. and (JH^J) we find 



(6.6) 

where, using l|3.9|l . we have 
(6.7) 



(z) - Si + z 1/2 ^+za 



Bi := c (a - 1) ^— - di(oo - a 2 ) = 

m — 1 m — 1 



Using 
(6.8) 

^(z) = z" 1 " 1 
Also, 

(6.9) 
By 

and, for fc > 1, 
so that 

Similarly 



i(m - 1) (fl [1] (z)y T m ~ 2 (z) + 2mil^{z)T m - 1 {z)+T m (z) 



3=0 



3=1 



r fc (z) ~ ag + fca^VZ 1 / 2 + + Z 3 / 2 ^, 
(z)) 2 t" 1 ~ 2 (z) ~ a™- 2 S 2 + Z 1/2 ^ + ZA. 



{z)T m - 1 {z) ~ a™- 1 ^ + Z 1 / 2 A + ZA. 



Using these expansions and 13. 5|) in l|6.8|l gives [recalling l|3.13[l and (|6.9|l ] 



ft 2 



(z) + ^ or;: 



whence 

(6.10) 

where 
(6.11) 

B 2 := - 



M [2 



777 — 1 9 an 

£? 2 + 2i?i H — - + Si 

Oq 777, 



( 2 )~b 2 z- 1 / 2 + ^ + z 1 / 2 a 



Z 1/2 A + ZA, 



«0 



ai(m — 1) 



m — 1 o „ an 
S 2 + 2Bi + — + Si 

ag 777 



«0 



-ai(m — 1) 



Si 



a 



777(777 — 1) 



By singularity analysis and the asymptotics of r„ , then 

a® 



-«i 



Recalling (|6.3|) , we observe that ~ A/" so that 



(6.12) 



Var X B = Var X B = /^ 2l - (X 11 ) 2 ~ + 
We pause here to remark that since $\operatorname{Var} X_n \to \infty$ as $n \to \infty$ for $m \ge 3$ (see
Remark A.2 in the Appendix), we must have $B_2 > 0$. We do not know a direct
proof of this fact.

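6.3. Higher moments and limiting distribution.
Proposition 6.1. For $s \ge 1$,

\[
\tilde\mu^{[s]}(z) = B_s\,Z^{-\frac{s}{2}+\frac12} + O(|Z|^{-\frac{s}{2}+1}),
\]

where $B_1$ and $B_2$ are given at (6.7) and (6.11), respectively, and, for $s \ge 3$,

\[
B_s = \frac{a_0}{-a_1(m-1)}\left[\frac{m-1}{2a_0}\sum_{j=1}^{s-1}\binom{s}{j}\,B_j B_{s-j} + s\,B_{s-1}\right]. \tag{6.13}
\]

Proof sketch. We proceed by induction. For $s = 1$ and $s = 2$, the result has been
established at (6.6) and (6.10), respectively.

Suppose $s \ge 3$. We will first obtain a singular expansion for $\tilde r^{[s]}(z)$ by analyzing
the contributions of the terms in the sum at (6.1). As in the proofs of Propositions
4.1 and 5.1, unless

(a) $s_0 = 0$ and exactly two of $s_1, \dots, s_m$ are nonzero, or

(b) $s_0 = 1$ and exactly one of $s_1, \dots, s_m$ is $s - 1$,

the contribution to $\tilde r^{[s]}(z)$ is $O(|Z|^{-\frac{s}{2}+\frac32})$.
In case (a), the contribution to $\tilde r^{[s]}(z)$ is

\[
\binom{m}{2}\rho^{m-1}a_0^{m-2}\,Z^{-\frac{s}{2}+1}\sum_{j=1}^{s-1}\binom{s}{j}\,B_j B_{s-j} + O(|Z|^{-\frac{s}{2}+\frac32}).
\]

In case (b), the contribution to $\tilde r^{[s]}(z)$ is

\[
m\rho^{m-1}\,s\,a_0^{m-1}\,B_{s-1}\,Z^{-\frac{s}{2}+1} + O(|Z|^{-\frac{s}{2}+\frac32}).
\]

This leads to

\[
\tilde r^{[s]}(z) = \left[\binom{m}{2}\rho^{m-1}a_0^{m-2}\sum_{j=1}^{s-1}\binom{s}{j}\,B_j B_{s-j} + m s\,\rho^{m-1}a_0^{m-1}B_{s-1}\right] Z^{-\frac{s}{2}+1} + O(|Z|^{-\frac{s}{2}+\frac32}),
\]

whence (2.5) and (3.10) complete the induction. □

Now using (6.13) and (6.7) we have $B_3 = 0$ and by induction $B_{2s+1} = 0$ for
$s = 1, 2, \dots$. Then [compare (5.6)]

\[
B_{2l} = \frac{1}{-2a_1}\sum_{j=1}^{l-1}\binom{2l}{2j}\,B_{2j}\,B_{2l-2j}, \quad l \ge 2,
\]

with $B_2$ given at (6.11). Now following the development leading to Theorem 5.2
we can conclude asymptotic normality for the space requirement.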

Theorem 6.2. Let $X_n$ denote the space requirement for uniformly distributed $m$-ary
search trees on $n$ keys. Then

\[
\frac{X_n - d_1(n+1)}{\sqrt{n}} \xrightarrow{\mathcal{L}} N(0,\sigma^2)
\quad\text{and}\quad
\frac{X_n - \mathbf{E}\,X_n}{\sqrt{\operatorname{Var} X_n}} \xrightarrow{\mathcal{L}} N(0,1),
\]

where



di 



m(l-pm^) 2 B 2 
and <t = 2 



(m — l)a* — ai 

Here $B_2$ is given by (6.11) with $\delta_1$ defined at (6.9) and $a_0$ and $a_1$ at (3.5) and (3.7),
respectively.

7. Number of leaves 

Lastly we consider the number of leaves in an $m$-ary search tree. This is
the additive functional $X_n$ corresponding to the toll $\mathbf{1}_{n = m-1}$ with initial
conditions $(X_0, \dots, X_{m-2}) = (0, 1, \dots, 1)$. Under the random permutation model, the
number of leaves is asymptotically normal (UJ Theorem 2.5]. We will establish an
analogous result (Theorem 7.2) under the uniform model.

The toll generating function is now $b(z) = z^{m-1}$. Note that

\[
b^{\odot s}(z) =
\begin{cases}
(1-z)^{-1}, & s = 0,\\
z^{m-1}, & s \ge 1.
\end{cases}
\]



Thus in JUJ 



(7.1) 



rW(z) 



«iH hs m =s 

si ,...,s rTl <s 



Given the similarity of the calculations with those of Section 6 we will be brief.
The interested reader is invited to flesh out the development of this section along
the lines of Section 6.

7.1. Mean. Substituting $s = 1$ in (7.1) and using (2.5) we get

+ ^± + Z^A + ZA 

—aim™- 1 3m™- 1 



so that 



u ll] 

Pn 



2a n 



-JV~ —n +Af. 

a* 



7.2. Variance. For the variance we consider again X n := X n — d\(n + 1), where 
now d\ := -^ r . Then 

where [compare <|6.7ll ] B\ = 0. Substituting s = 2 in (|7.1|) we get 

2 



fl 2 l(z) = z r ' 
Thus 

where 
(7.2) 



l + m(m-l) (m [1] (2)) t" 1 - 2 (z) 



P 



- 1 + ZA + Z 3/2 A. 



^(z)^B 2 Z- 1 l 2 + A + Z 1 / 2 A 1 



Bo 



Si) 



— ai(m — 1) 

with 5i defined as at (|6.9|) (but now with d\ = p/a*). This leads to 
Clearly $B_2 > 0$, and therefore $\operatorname{Var} X_n$ grows at an asymptotically linear rate.

7.3. Higher moments and limiting distribution. The analog of Proposition 6.1
is the following. We omit the proof.

Proposition 7.1. For $s \ge 1$,

\[
\tilde\mu^{[s]}(z) = B_s\,Z^{-\frac{s}{2}+\frac12} + O(|Z|^{-\frac{s}{2}+1}),
\]

where $B_1 = 0$, $B_2$ is given at (7.2), and, for $s \ge 3$,



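It follows easily from Proposition 7.1 that the number of leaves is asymptotically
normal.

Theorem 7.2. Let $X_n$ denote the number of leaves in a uniformly distributed $m$-ary
search tree on $n$ keys. Then

\[
\frac{X_n - d_1(n+1)}{\sqrt{n}} \xrightarrow{\mathcal{L}} N(0,\sigma^2)
\quad\text{and}\quad
\frac{X_n - \mathbf{E}\,X_n}{\sqrt{\operatorname{Var} X_n}} \xrightarrow{\mathcal{L}} N(0,1),
\]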



where 

a 



2 pm— (p™- 1 + 
a*(m — 1) 



Appendix A. Growth of variance for nondegenerate functionals 

In Section 6.2 we claimed that the variance of the space requirement tends to
infinity as $n \to \infty$ for $m \ge 3$. In this appendix we identify additive functionals that
are degenerate, i.e., for each fixed $n$ have the same value for all $m$-ary search trees
with $n$ keys, and provide a lower bound on the rate of growth of the variance of
any nondegenerate additive functional.

Theorem A.1. Consider an additive functional $(X_n)_{n \ge 0}$ [as at (2.1)] with toll $(b_n)_{n \ge 0}$,
with initial conditions $(x_0, \dots, x_{m-2}) = (b_0, \dots, b_{m-2})$. Then the following
statements are equivalent:

(a) $\mathcal{L}(X_n)$ is degenerate for every $n \ge 0$.

(b) The toll satisfies

\[
b_n =
\begin{cases}
n b_1 - (n-1) b_0, & n = 2, \dots, m-2,\\
(m-1)(b_1 - 2b_0), & n \ge m-1.
\end{cases}
\]

(c) $X_n = n b_1 - (n-1) b_0$ for every $n \ge 0$.

Moreover, if (b) does not hold then $\sigma_n^2 := \operatorname{Var} X_n = \Omega(\log n)$ as $n \to \infty$.

Remark A.2. Before we prove the theorem we apply it to the space requirement.
It is easily checked that, for this additive functional, condition (b) holds only when
$m = 2$. Thus the space requirement is not degenerate (and its variance tends to
infinity as $n \to \infty$) for $m \ge 3$.
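
The contrast between a degenerate toll and the space requirement can be illustrated with a small computation. The sketch below is illustrative only and all function names are hypothetical. It evaluates an additive functional along two different deterministic tree shapes (all remaining keys in the first subtree, or keys spread evenly), assuming $m \ge 3$ so that $b_0$ and $b_1$ are both initial conditions. For a toll of the form in (b) both shapes give $n b_1 - (n-1) b_0$; for the space requirement they differ.

```python
def degenerate_toll(n: int, m: int, b0: int, b1: int) -> int:
    """Toll of Theorem A.1(b); for n <= m-2 it doubles as the initial condition x_n = b_n."""
    if n <= m - 2:
        return n * b1 - (n - 1) * b0
    return (m - 1) * (b1 - 2 * b0)

def space_toll(n: int) -> int:
    """Space requirement: x_0 = 0, x_n = 1 for 1 <= n <= m-2, toll 1 for n >= m-1."""
    return 0 if n == 0 else 1

def all_in_first(rest: int, m: int) -> list:
    """Subtree sizes with every remaining key in the first subtree."""
    return [rest] + [0] * (m - 1)

def spread_evenly(rest: int, m: int) -> list:
    """Subtree sizes as equal as possible."""
    q, r = divmod(rest, m)
    return [q + 1] * r + [q] * (m - r)

def evaluate(n: int, m: int, toll, split) -> int:
    """X_n = b_n + sum of X over the m subtrees, for the shape chosen by `split`."""
    if n <= m - 2:
        return toll(n)                      # initial condition
    sizes = split(n - (m - 1), m)           # m-1 keys stay in the root
    return toll(n) + sum(evaluate(k, m, toll, split) for k in sizes)

if __name__ == "__main__":
    m, b0, b1 = 4, 2, 7
    toll = lambda n: degenerate_toll(n, m, b0, b1)
    print(evaluate(10, m, toll, all_in_first), evaluate(10, m, toll, spread_evenly))
    print(evaluate(4, 3, space_toll, all_in_first), evaluate(4, 3, space_toll, spread_evenly))
```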
Proof of Theorem A.1. We will show below that (b) and (c) are equivalent.

To show the equivalence with (a), suppose first that (a) holds, so that $X_n = x_n$
deterministically for all $n \ge 0$. Then

\[
\begin{aligned}
x_{n+(m-1)} &= x_n + (m-1)x_0 + b_{n+(m-1)} && \text{for } n \ge 0\\
&= x_{n-1} + x_1 + (m-2)x_0 + b_{n+(m-1)} && \text{for } n \ge 1,
\end{aligned}
\tag{A.1}
\]

and so $x_n = x_{n-1} + (x_1 - x_0) = x_{n-1} + (b_1 - b_0)$ for $n \ge 1$. Condition (c) then follows
by induction. Conversely, condition (c) trivially implies (a), and the equivalent (b)
shows that $X_n$ is indeed an additive functional.

We now show the equivalence of (b) and (c). If (c) holds then so does (A.1)
[because $(X_n) = (x_n)$ is an additive functional], from which [by solving for $b_{n+(m-1)}$]
it is easy to check that (b) holds. Conversely, if (b) holds, then (c) is trivially true
for $n = 0, \dots, m-2$, and holds by induction for $n \ge m-1$.

Suppose now that $(b_n)$ does not satisfy (b). Let $n_0$ be such that $\mathcal{L}(X_{n_0})$ is not
degenerate. Then for any $n \ge n_1 := n_0 + (m-1)$, there is positive probability of
having a subtree containing precisely $n_0$ keys, so that by the law of total variance
we must have $\sigma_n^2 > 0$.

Finally we show that if $\mathcal{L}(X_n)$ is nondegenerate for all $n \ge n_1$, then $\sigma_n^2 = \Omega(\log n)$
as $n \to \infty$. First, it is clear that in this case there exists $\epsilon > 0$ such that $\sigma_n^2 \ge \epsilon$
for all $n \in [n_1, m n_1 + (m-2)]$. Now suppose $n \ge m n_1 + (m-1)$. Then at least
one subtree must have size in the range $[n_1, n - (m-1)] \subset [n_1, n-1]$, so that by
induction and the law of total variance we have $\sigma_n^2 \ge \epsilon$ for all $n \ge n_1$.

For $n \ge n_1 + (m-1)$, let $p_n$ denote the probability that a tree of size $n$ has its
first subtree of size $n - n_1 - (m-1)$, its second of size $n_1$, and the rest of size 0.
Then by (2.2) and the asymptotics of $\tau_n$ (see Remark 3.1), as $n \to \infty$ we have

\[
p_n = \frac{\tau_{n-n_1-(m-1)}\,\tau_{n_1}}{\tau_n} \to \tau_{n_1}\,\rho^{\,n_1+(m-1)}.
\]

Also $p_n > 0$ for $n \ge n_1 + (m-1)$, and so $\delta := \inf\{p_n : n \ge n_1 + (m-1)\} > 0$.
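
The quantities $\tau_n$ and $p_n$ are easy to compute exactly for small $n$. The following sketch is illustrative; the function names are hypothetical. It assumes that $\tau_n = 1$ for $n \le m-2$ and that $\tau_n = [z^{n-(m-1)}]\,\tau^m(z)$ for $n \ge m-1$, which is the counting recurrence implied by the construction of the tree. In the binary case the $\tau_n$ are the Catalan numbers, for which $\rho = 1/4$ is a standard fact (used here only in the final comparison).

```python
from fractions import Fraction

def count_trees(m: int, N: int) -> list:
    """tau_0..tau_N: number of m-ary search trees on n keys."""
    tau = []
    for n in range(N + 1):
        if n <= m - 2:
            tau.append(1)                    # all keys fit in a single (non-full) node
            continue
        k = n - (m - 1)                      # keys left after filling the root
        power = [1] + [0] * k                # tau(z)^0, truncated at degree k
        for _ in range(m):                   # multiply by tau(z) m times
            power = [sum(power[i] * tau[d - i] for i in range(d + 1))
                     for d in range(k + 1)]
        tau.append(power[k])                 # tau_n = [z^k] tau(z)^m
    return tau

def split_probability(tau: list, n: int, n1: int, m: int) -> Fraction:
    """p_n: first subtree of size n - n1 - (m-1), second of size n1, the rest empty."""
    return Fraction(tau[n - n1 - (m - 1)] * tau[n1], tau[n])

if __name__ == "__main__":
    print(count_trees(3, 6))
    tau = count_trees(2, 60)
    limit = tau[2] * Fraction(1, 4) ** 3     # tau_{n1} * rho^(n1 + m - 1) with n1 = 2, m = 2
    for n in (10, 20, 40, 60):
        print(n, float(split_probability(tau, n, 2, 2)), float(limit))
```

The printed values of $p_n$ stay bounded away from zero and drift toward the limit, which is all the argument needs.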

Define $\alpha_0 := n_1$, $\alpha_1 := m n_1 + (m-1)$, and, for $k \ge 2$, $\alpha_k := m\alpha_{k-1} + n_1 + (m-1)$.
We will show, for each $k \ge 0$ and any $n \ge \alpha_k$, that $\sigma_n^2 \ge (1 + k\delta)\epsilon$, and the
logarithmic-growth claim follows.

For $k = 0$ the result has been shown above. For $k = 1$, using the law of total
variance and $p_n \ge \delta$ the result follows (compare the case $k \ge 2$ to follow). Suppose
$k \ge 2$ and $n \ge \alpha_k$. Then $n - (m-1) \ge m\alpha_{k-1}$, so there must be at least one subtree
of size at least $\alpha_{k-1}$. Hence $\sigma_n^2 \ge (1 + (k-1)\delta)\epsilon$. But with probability $p_n \ge \delta$, the
first two subtrees are each of size at least $n_1$; then by the law of total variance we
have $\sigma_n^2 \ge [1 + (k-1)\delta]\epsilon + \delta\epsilon = (1 + k\delta)\epsilon$, as desired. □